Succinct representations of lcp information and improvements in the compressed suffix arrays
نویسنده
چکیده
We introduce two succinct data structures to solve various string problems. One is for storing the information of lcp, the longest common prefix, between suffixes in the suffix array, and the other is an improvement in the compressed suffix array which supports linear time counting queries for any pattern. The former occupies only 2n + o(n) bits for a text of length n for computing lcp between adjacent suffixes in lexicographic order in constant time, and 6n + o(n) bits between any two suffixes. No data structure in the literature attained linear size. The latter has size proportional to the text size and it is applicable to texts on any alphabet Σ such that |Σ| = log n. These space-economical data structures are useful in processing huge amounts of text
منابع مشابه
Sampled Longest Common Prefix Array
When augmented with the longest common prefix (LCP) array and some other structures, the suffix array can solve many string processing problems in optimal time and space. A compressed representation of the LCP array is also one of the main building blocks in many compressed suffix tree proposals. In this paper, we describe a new compressed LCP representation: the sampled LCP array. We show that...
متن کاملLow Space External Memory Construction of the Succinct Permuted Longest Common Prefix Array
The longest common prefix (LCP) array is a versatile auxiliary data structure in indexed string matching. It can be used to speed up searching using the suffix array (SA) and provides an implicit representation of the topology of an underlying suffix tree. The LCP array of a string of length n can be represented as an array of length n words, or, in the presence of the SA, as a bit vector of 2n...
متن کاملOptimal Time and Space Construction of Suffix Arrays and LCP Arrays for Integer Alphabets
Suffix arrays and LCP arrays are one of the most fundamental data structures widely used for various kinds of string processing. Many problems can be solved efficiently by using suffix arrays, or a pair of suffix arrays and LCP arrays. In this paper, we consider two problems for a string of length N , the characters of which are represented as integers in [1, . . . , σ] for 1 ≤ σ ≤ N ; the stri...
متن کاملWee LCP
We prove that longest common prefix (LCP) information can be stored in much less space than previously known. More precisely, we show that in the presence of the text and the suffix array, o(n) additional bits are sufficient to answer LCP-queries asymptotically in the same time that is needed to retrieve an entry from the suffix array. This yields the smallest compressed suffix tree with sublog...
متن کاملLightweight LCP-Array Construction in Linear Time
The suffix tree is a very important data structure in string processing, but it suffers from a huge space consumption. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three (compressed) components: the suffix array, the LCP-array, and data structures for simulating navigational operations on the suffix tree. The LCP-array stores the leng...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002